How frequent are song repetitions in the same playlist?

In [1]:
import pandas as pd
import gc
from bokeh.charts import Histogram, BoxPlot, output_file, show
from bokeh.charts import output_notebook
output_notebook()
/home/tales/anaconda3/lib/python3.5/site-packages/bokeh/util/deprecation.py:34: BokehDeprecationWarning: 
The bokeh.charts API has moved to a separate 'bkcharts' package.

This compatibility shim will remain until Bokeh 1.0 is released.
After that, if you want to use this API you will have to install
the bkcharts package explicitly.

  warn(message)
In [2]:
df = pd.read_csv("../../../data/play_track.csv", sep=";")

Counting the songs for each playlist

In [3]:
playlist_len = df.groupby("pid").size()

Removing duplicate songs from the same playlist and then counting

In [4]:
unique_pid_track = df[["pid", "track_uri"]].drop_duplicates()
playlist_unique_len = unique_pid_track.groupby("pid")["track_uri"].size()
unique_pid_track = None
df = None
gc.collect()
Out[4]:
133

Calculating the number of repeated songs for each playlist

In [5]:
diff = (playlist_len - playlist_unique_len)
prop = diff / playlist_len
In [6]:
"Proportion of playlists with repeated songs: {}".format((diff > 0).sum() / len(diff))
Out[6]:
'Proportion of playlists with repeated songs: 0.292851'
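
The counting steps above can be checked by hand on a tiny table. The sketch below uses a made-up three-playlist DataFrame (the rows and the `toy_*` names are illustrative, not taken from `play_track.csv`) and relies on `size()` and `nunique()`, which should give the same per-playlist counts as the notebook cells.

```python
import pandas as pd

# Hypothetical toy data: playlist 1 holds track "a" three times,
# playlist 2 has no repeats, playlist 3 holds track "d" twice.
toy = pd.DataFrame({
    "pid":       [1, 1, 1, 1, 2, 2, 3, 3],
    "track_uri": ["a", "a", "a", "b", "a", "c", "d", "d"],
})

# Total rows per playlist (repeats included).
toy_len = toy.groupby("pid").size()

# Distinct tracks per playlist.
toy_unique_len = toy.groupby("pid")["track_uri"].nunique()

# Repeated entries per playlist, and their share of the playlist length.
toy_diff = toy_len - toy_unique_len
toy_prop = toy_diff / toy_len

# Share of playlists with at least one repeat; the mean of a boolean
# Series equals (toy_diff > 0).sum() / len(toy_diff).
share_with_repeats = (toy_diff > 0).mean()

print(toy_diff.tolist())   # [2, 0, 1]
print(toy_prop.tolist())   # [0.5, 0.0, 0.5]
```

Note that `diff` counts extra occurrences, not distinct repeated tracks: playlist 1 has one repeated track but a `diff` of 2.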
In [7]:
p = Histogram(playlist_len, title="Playlist length", width=600, height=300, tools=["save", "xpan", "xwheel_zoom"])
show(p)
In [8]:
p = Histogram(diff[diff > 0], title="Repeated songs by playlist", width=600, height=300, tools=["save", "xpan", "xwheel_zoom"])
show(p)
In [9]:
p = Histogram(prop[prop > 0], title="Proportion of repeated songs by playlist", width=600, height=300, tools=["save", "xpan", "xwheel_zoom"])
show(p)
In [11]:
prop.describe()
Out[11]:
count    1000000.000000
mean           0.010037
std            0.028243
min            0.000000
25%            0.000000
50%            0.000000
75%            0.009346
max            0.980000
dtype: float64
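
The summary above is computed over all playlists, so its quartiles are pulled toward zero by the roughly 71% of playlists that have no repeats. A minimal sketch of how the two views differ, using a small made-up Series as a stand-in for `prop` (the values are illustrative only, not from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for `prop`: 6 playlists without repeats, 4 with.
toy_prop = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4])

# 75th percentile over every playlist, zeros included
# (this is what describe() reports).
q75_all = toy_prop.quantile(0.75)

# 75th percentile restricted to playlists with at least one repeat.
q75_repeats = toy_prop[toy_prop > 0].quantile(0.75)

print(round(q75_all, 3), round(q75_repeats, 3))   # 0.175 0.325
```

On the real data, `prop[prop > 0].describe()` would give the corresponding summary for playlists with repeats only.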

  • Almost one third of the playlists (about 29.3%) have repeated songs.

  • 75% of all playlists have less than 1% of their songs repeated: the 75th percentile of prop, computed over all playlists including those without repeats, is about 0.93%.

  • Tales Pimentel
    tales.tsp@gmail.com